REVERSED RESIDUALS IN AUTOREGRESSIVE
TIME SERIES ANALYSIS
A. J. Lawrance

School of Mathematics and Statistics 
University of Birmingham 
Birmingham B15 2TT, U.K. 

P. A. W. Lewis 

Department of Operations Research 
1 INTRODUCTION

Both linear and nonlinear time series can have directional features, features which 
indicate that the series do not maintain identical statistical properties when the 
direction on the time scale is reversed. The main purpose of the present paper is 
to develop the analysis of these features and to indicate and illustrate how they 
can be used for the investigation and modelling of linear or nonlinear autoregressive 
statistical models. In particular, the primary aim of the paper is to introduce the
idea of reversed residuals (mentioned in the discussions of Green (1984) and Lawrance
and Lewis (1985)) and to develop some of their properties. Reversed residuals in
general are the residuals which would have been obtained if time ran in the opposite
direction; they arise naturally when considering partial autocorrelations for time
series, and cross correlations of residuals and squared reversed residuals allow
extensions of the model identification ideas given in Lawrance and Lewis (1986).
Particular pairs of reversed and ordinary residuals are shown to produce partial
autocorrelation coefficients; quadratic types of partial autocorrelation coefficients
are introduced to assess dependence associated with nonlinear models which
nevertheless have linear autoregressive (Yule-Walker) correlation structures.

A parallel theoretical study in this paper concerns the use of reversed residuals
in the investigation of random coefficient autoregressive models (Andel, 1976, 1983;
Vervaat, 1979; Nicholls and Quinn, 1982); the class includes the NEAR(1) models
of Lawrance and Lewis (1981) with exponential marginals, the BGAR(1) models
of Lewis et al. (1989) with gamma marginals and the PBAR models of McKenzie
(1985) with beta marginals. A result for all such models is given concerning a
cut-off property of quadratic cross correlations of the ordinary and reversed residuals;
a partial reversibility condition, in terms of moments, is also stated for these models.

The relevance of our concern with directionality is due to the fact that most
nonlinear processes are directional and that, amongst linear autoregressive processes,
the only non-directional or reversible ones are Gaussian. This latter fact forms part of
a central result due to Weiss (1975) and indicates that the use of reversed residuals is
also relevant to assessing the linearity of non-Gaussian autoregressive processes.

Figure 1: Autocorrelation functions of the deseasonalized Stour river data (left panel,
"Deseasonalized Data") and of the residuals from a first order autoregressive fit to those
data (right panel, "Residuals"). The bands define approximate confidence intervals on the
autocorrelations for lags greater than zero, on the assumption that the true autocorrelation
function is zero for lags greater than zero.

The use of reversed residuals is illustrated on a series of deseasonalized monthly
British riverflow data, in which it is shown that there is some nonlinear first order
autoregressive dependency. The data are monthly flows of the River Stour,
Stourport, Gloucestershire, England, from 1918 to 1945; there are 444 data values
in all. Following common hydrological practice, the series has been deseasonalized by
standardizing each value, subtracting its own monthly mean and dividing by its own
monthly standard deviation. This is effective in reducing the flows to a stationary
series; the marginal distribution of the deseasonalized data is non-Gaussian and
very positively skewed. The autocorrelation function is reasonably geometric, as can be
seen from the left hand panel in Figure 1, and the partial autocorrelation function
(not shown) cuts off at lag 2, as expected. Furthermore, the fit of a first order linear
autoregressive model is satisfactory as judged by the autocorrelation function of its
residuals, shown in the right hand panel of Figure 1. More formally, testing shows
that the cumulated periodogram of the residuals is consistent with the hypothesis of
uncorrelated residuals. This result is confirmed by modified Box-Pierce chi-square
statistics at 12, 24, 36 and 48 lags.
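The deseasonalization step described above can be sketched in Python as follows. This is a minimal illustration, assuming the flows are held in a NumPy array in time order and that the first observation falls in January; the function name `deseasonalize_monthly` and its `start_month` parameter are hypothetical, and the data below are synthetic, not the Stour series.

```python
import numpy as np


def deseasonalize_monthly(flows, start_month=0):
    """Standardize each value by its own calendar month's mean and standard deviation.

    flows: 1-D sequence of monthly values in time order.
    start_month: calendar month of the first value (0 = January); an assumption,
    since the alignment of the series is not stated in the text.
    """
    x = np.asarray(flows, dtype=float)
    months = (np.arange(x.size) + start_month) % 12
    z = np.empty_like(x)
    for m in range(12):
        mask = months == m
        vals = x[mask]
        # Subtract the monthly mean and divide by the monthly standard deviation.
        z[mask] = (vals - vals.mean()) / vals.std(ddof=1)
    return z


# Synthetic illustration only: 444 values (37 years) of skewed flows with a seasonal level.
rng = np.random.default_rng(0)
seasonal_level = np.tile(np.linspace(1.0, 3.0, 12), 37)
flows = seasonal_level * rng.gamma(2.0, 1.0, size=444)
z = deseasonalize_monthly(flows)
print(z[:6].round(3))
```

The sample standard deviation with divisor n - 1 (`ddof=1`) is a choice made here; the text does not say which divisor was used.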

Reversed autoregressive residuals are introduced in the next section. The purpose
of this paper is to see what more can be said about autoregressive fits, and in
particular about the first order fit to the Stour river data, by using third and
fourth order joint properties of both the residuals and the reversed residuals. For
this purpose, cross correlations of residuals and squared reversed residuals, as well
as other combinations of the residuals, are examined. In addition, quadratic types of
partial correlation coefficients are introduced.
2 REVERSED RESIDUALS



Consider the time series $\{X_t\}$, which is stationary with mean zero; the $p$th order
autoregressive residual sequence for this series is defined as

$$R_t^{(p)} = X_t - \alpha_1 X_{t-1} - \cdots - \alpha_p X_{t-p}, \qquad t = 0, \pm 1, \pm 2, \ldots, \tag{1}$$

where $\alpha_1, \alpha_2, \ldots, \alpha_p$ are chosen to minimize

$$E\left(X_t - \alpha_1 X_{t-1} - \alpha_2 X_{t-2} - \cdots - \alpha_p X_{t-p}\right)^2. \tag{2}$$

If the true model for $\{X_t\}$ is autoregressive of order $p$, then $\alpha_1, \alpha_2, \ldots, \alpha_p$ are its
coefficients and $R_t^{(p)}$ is its innovation term; in general $\alpha_1, \alpha_2, \ldots, \alpha_p$ are given by $p$
linear equations of the Yule-Walker type involving the autocorrelations of $\{X_t\}$.
The reversed $p$th order autoregressive residuals of $\{X_t\}$ are correspondingly defined,
with the index $t$ increasing, as

$$RR_t^{(p)} = X_t - \beta_1 X_{t+1} - \beta_2 X_{t+2} - \cdots - \beta_p X_{t+p}, \qquad t = 0, \pm 1, \pm 2, \ldots, \tag{3}$$

with $\beta_1, \beta_2, \ldots, \beta_p$ minimizing

$$E\left(X_t - \beta_1 X_{t+1} - \beta_2 X_{t+2} - \cdots - \beta_p X_{t+p}\right)^2. \tag{4}$$

The $\beta$'s thus satisfy an identical set of linear equations to those for $\alpha_1, \alpha_2, \ldots, \alpha_p$,
so that theoretically $\beta_i = \alpha_i$.
Calculation of the residuals and reversed residuals will be based on ordinary least
squares regression rather than on the nearly equivalent Gaussian-theory first order
autoregressive likelihood.

Suppose the available data series, after correction for its mean $\bar{x}$, is given by
$(x_1, x_2, \ldots, x_n)$. It is convenient when dealing with $p$th order residuals to form the
display

$$\begin{array}{cccc}
x_1 & x_2 & \cdots & x_{p+1} \\
x_2 & x_3 & \cdots & x_{p+2} \\
\vdots & \vdots & & \vdots \\
x_{n-p} & x_{n-p+1} & \cdots & x_n
\end{array}$$

The $p$th order reversed residuals are then formed as the residuals from regressing
the first column on the $p$ subsequent columns; similarly, the $p$th order ordinary residuals
come from regressing the $(p+1)$th column on the $p$ previous columns. In this way,
the coefficients $\alpha_1, \alpha_2, \ldots, \alpha_p$ in (1) and $\beta_1, \beta_2, \ldots, \beta_p$ in (3) are implicitly estimated
but never needed explicitly; to make the most use of the available data, the two sets
are not constrained to be estimated as equal, although theoretically they are. An
alternative procedure would be to use one of the two estimated sets, or a combination
of them, for both the residuals and the reversed residuals; these comments have
implications for the calculation of partial correlation coefficients, to be discussed in Section 4.
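A minimal NumPy sketch of the lag-display regressions just described is given below. The function name `ar_residuals` is hypothetical; it fits both regressions without an intercept after removing the sample mean and estimates the two coefficient sets separately, as in the text. With 1-based times, the ordinary residuals correspond to $t = p+1, \ldots, n$ and the reversed residuals to $t = 1, \ldots, n-p$. The simulated AR(1) with centred exponential innovations is an illustrative assumption.

```python
import numpy as np


def ar_residuals(x, p):
    """Ordinary and reversed p-th order residuals from the lag display.

    Row j of the display is (x_j, x_{j+1}, ..., x_{j+p}). Ordinary residuals come
    from regressing the last column on the p previous ones; reversed residuals from
    regressing the first column on the p subsequent ones. The two coefficient sets
    are estimated separately, so they need not be equal.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                   # correct for the sample mean
    n = x.size
    D = np.column_stack([x[i:n - p + i] for i in range(p + 1)])  # (n-p) x (p+1)
    past = D[:, :p][:, ::-1]                           # X_{t-1}, ..., X_{t-p}
    alpha, *_ = np.linalg.lstsq(past, D[:, p], rcond=None)
    r = D[:, p] - past @ alpha                         # R_t, t = p+1, ..., n
    future = D[:, 1:]                                  # X_{t+1}, ..., X_{t+p}
    beta, *_ = np.linalg.lstsq(future, D[:, 0], rcond=None)
    rr = D[:, 0] - future @ beta                       # RR_t, t = 1, ..., n-p
    return r, rr, alpha, beta


# Illustration on a simulated AR(1) with skewed (centred exponential) innovations.
rng = np.random.default_rng(1)
n, rho = 5000, 0.6
e = rng.exponential(1.0, n) - 1.0
x = np.zeros(n)
for t in range(1, n):
    x[t] = rho * x[t - 1] + e[t]
r, rr, alpha, beta = ar_residuals(x, 1)
print(alpha.round(3), beta.round(3))
```

Both estimated coefficients should be close to `rho` but will not, in general, be exactly equal.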

Another alternative, mentioned initially, would be to assume a standard $p$th
order linear Gaussian autoregressive model for $\{X_t\}$; however, there is no real
justification for this, except that the maximum likelihood estimates of the coefficients
would correspond very closely to the regression estimates of $\alpha_1, \alpha_2, \ldots, \alpha_p$. Fitting
to the reversed series would similarly give estimates of the parameters $\beta_1, \beta_2, \ldots, \beta_p$.

Figure 2: Histogram for the ordinary residuals, and empirical quantile-quantile plot for
the ordinary and reversed residuals for the Stour data.

The structure of reversed residuals is rather complicated, even when the order
considered is that of the true linear autoregressive model; they do not have the
same distribution as the ordinary residuals, and they are not serially independent.
It is straightforward to see, however, as in Lawrance and Lewis (1986) for the
ordinary residuals, that the reversed residuals are uncorrelated under the linearly
correct order of autoregression, that is, under Yule-Walker autoregression; moreover,
they have the usual cut-off property at the correct order of autoregression, as
associated with the ordinary residuals of autoregressive models. The cross correlation
function of the ordinary and reversed residuals is also zero at negative lags less than
the degree of Yule-Walker autoregression. These features are all evident in the
corresponding plots for the Stour data; a histogram of the ordinary residuals is given
in the right hand panel of Figure 2. The distributions of the ordinary and reversed
residuals for these data are compared in the empirical quantile-quantile plot shown in
the left hand panel of Figure 2; they differ significantly in the tails. Moreover, since
the average of each set of residuals is approximately zero, it is seen that both sets are
positively skewed and thus non-normal.

To illustrate the structure of reversed residuals, consider the linear AR(1) model
given by

$$X_t = \rho X_{t-1} + \epsilon_t. \tag{5}$$

Then

$$RR_t^{(1)} = X_t - \rho X_{t+1} = (1 - \rho^2) X_t - \rho \epsilon_{t+1} = -\rho \epsilon_{t+1} + (1 - \rho^2)\Big\{\epsilon_t + \sum_{r=1}^{\infty} \rho^r \epsilon_{t-r}\Big\}. \tag{6}$$

It is seen that for this linear AR(1) model $RR_t^{(1)}$ depends on $\epsilon_{t+1}, \epsilon_t, \epsilon_{t-1}, \ldots$
but is independent of $\epsilon_{t+2}, \epsilon_{t+3}, \ldots$, that is, independent of the ordinary
residuals $R_{t+2}^{(1)}, R_{t+3}^{(1)}, \ldots$. However, the sequence $\{RR_t^{(1)}\}$ itself is dependent at all
lags, unless the $\{\epsilon_t\}$ are Gaussian.
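The identity $RR_t^{(1)} = (1 - \rho^2)X_t - \rho\epsilon_{t+1}$ for the AR(1) model can be checked numerically. The sketch below assumes centred exponential innovations purely for illustration; it simulates the model, confirms the identity, and estimates the lag 1 autocorrelation of the reversed residuals, which should be close to zero, alongside one quadratic correlation as a higher order dependence measure. The helper name `corr` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho = 100_000, 0.7
eps = rng.exponential(1.0, n) - 1.0      # skewed, mean-zero innovations (an assumption)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = rho * x[t - 1] + eps[t]       # linear AR(1)

rr = x[:-1] - rho * x[1:]                          # RR_t = X_t - rho X_{t+1}
rr_alt = (1 - rho**2) * x[:-1] - rho * eps[1:]     # (1 - rho^2) X_t - rho eps_{t+1}


def corr(u, v):
    """Sample correlation of two equal-length arrays."""
    return np.corrcoef(u, v)[0, 1]


lag1 = corr(rr[:-1], rr[1:])              # reversed residuals are uncorrelated
quad = corr(rr[:-1] ** 2, rr[1:])         # a higher order (quadratic) dependence measure
print(f"lag-1 corr of RR: {lag1:.3f}; corr(RR_t^2, RR_t+1): {quad:.3f}")
```

With skewed innovations the quadratic correlation is typically non-zero, in line with the statement that the reversed residuals are uncorrelated but dependent; no particular value is implied.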

The main motivation for reversed residuals in general is that they capture
directionality in a meaningful manner and allow model validation or criticism to be
extended both beyond standard linear methods and in a different way from higher
order methods involving ordinary residuals. The relevant class of models will naturally
include a Yule-Walker autoregressive aspect but will in general be nonlinear. The
correct order of this linear autoregression is assumed to have been determined by
conventional means. The autoregressive aspect then leads to uncorrelated reversed
residuals, and the suggestion is to obtain measures of their higher order dependency
which can be compared with the corresponding estimated quantities. We will suggest
some "quadratic" correlations involving the squaring of residuals and reversed
residuals. An assessment of reversibility can also be based on the quadratic correlation
function.

3 USEFUL QUANTITIES IN TERMS OF RESIDUALS AND REVERSED RESIDUALS

In a previous paper (Lawrance and Lewis, 1986), use was made of higher order
correlations of the ordinary residuals as a means to help identify nonlinear
autoregressive processes which nevertheless had Yule-Walker linear autoregressions.
In particular, attention was directed at $\mathrm{Corr}\{R_t^{(p)}, (R_{t+r}^{(p)})^2\}$; for a $p$th order standard
linear autoregressive process these correlations are zero, while for a nonlinear process
with $p$th order Yule-Walker autoregressions the residuals $R_t^{(p)}$ are uncorrelated
but dependent. Thus the higher order correlations, which will be called quadratic
correlations when they involve squaring (such as that just cited), give an assessment
of the uncorrelated dependence; Granger and Anderson (1978) first considered
the autocorrelations of squared residuals. As a data analysis tool, these correlations
would be estimated for a range of lags for given $p$, the value of $p$ having already
been determined by a Yule-Walker linear analysis. The further analysis is then
concerned with diagnosing nonlinear features in the autoregressions. Model-based
calculations of these quadratic correlations allow an assessment of whether the data
are in reasonable agreement with the (nonlinear) autoregressive model.

With the introduction of reversed residuals there are three further possible pairs
of quadratic correlations which could be examined, that is, correlations of

$$\big\{(R_t^{(p)})^2, RR_{t-r}^{(p)}\big\}, \quad \big\{R_t^{(p)}, (RR_{t-r}^{(p)})^2\big\}, \quad \big\{RR_t^{(p)}, (RR_{t-r}^{(p)})^2\big\}, \qquad r = \pm 1, \pm 2, \ldots \tag{7}$$

It is still a matter of experience which of these is most useful in model
validation; when considered over a range of positive and negative values of $r$,
they are all basically functions of third moments of the forms $E(Y_t^3)$, $E(Y_t^2 Y_{t+r})$
and $E(Y_t Y_{t+r} Y_{t+s})$. One possible guide to their use could be any special properties
which might be exploited in relation to the type of model of interest; properties of
these correlations for linear and random coefficient first order autoregressive models
will be considered in Section 5.

Figure 3: Autocorrelation functions of the squared ordinary and reversed residuals for the
Stour data. The bands define approximate confidence intervals on the autocorrelations for
lags greater than zero, on the assumption that the true autocorrelation function is zero for
lags greater than zero.
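The sample versions of the quadratic correlations in (7) are ordinary lagged correlations of transformed residual series. The sketch below assumes the ordinary and reversed residuals have already been placed on a common time index (for $p$th order residuals from the lag display, the common times are $t = p+1, \ldots, n-p$); the function names `lagged_corr` and `quadratic_ccfs` are hypothetical, and the check uses synthetic Gaussian noise with a known one-step shift.

```python
import numpy as np


def lagged_corr(u, v, r):
    """Sample Corr(u_t, v_{t-r}) for series u, v observed at the same times t.

    Positive r pairs u_t with earlier values of v; negative r with later ones.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    n = u.size
    if r >= 0:
        a, b = u[r:], v[:n - r]
    else:
        a, b = u[:n + r], v[-r:]
    return np.corrcoef(a, b)[0, 1]


def quadratic_ccfs(res, rev, lags):
    """Sample versions of the three pairs in (7) over the given lags."""
    return {
        r: (
            lagged_corr(res**2, rev, r),     # {R_t^2, RR_{t-r}}
            lagged_corr(res, rev**2, r),     # {R_t, RR_{t-r}^2}
            lagged_corr(rev, rev**2, r),     # {RR_t, RR_{t-r}^2}
        )
        for r in lags
    }


# Sanity check with a known shift: u_t = w_{t+1}, v_t = w_t, so Corr(u_t, v_{t+1}) = 1.
rng = np.random.default_rng(3)
w = rng.normal(size=500)
u, v = w[1:], w[:-1]
print(round(lagged_corr(u, v, -1), 6))
```

Approximate independence bands such as those in the figures are not computed here.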

For the Stour data, the autocorrelation functions of the squared ordinary and
reversed residuals are given in Figure 3. The autocorrelation at lag 1 of the squared
ordinary residuals is evidence here of nonlinearity in the series; the nonlinearity
is also picked up in the autocorrelation function of the squared reversed residuals. A
matrix of the third order cross correlation functions, three of which are given in (7),
is shown in Figure 4 and will be interpreted against the theoretical results for linear
and random coefficient models in Section 6.

Another use of these quantities is in detecting directionality. For instance, the
marginal distributions of $R_t^{(p)}$ and $RR_t^{(p)}$ will be equal for reversible processes, and
the scatters of $R_t^{(p)}$ and $RR_t^{(p)}$ should be symmetric about the 45° line. If reversibility
is restricted to the relevant moment conditions, then the correlation functions of the
first two pairs in (7) should be mirror images of each other; similarly, the correlation
function of the last pair in (7) should equal the corresponding quantity in terms of
ordinary residuals.

It is seen from Figure 4 that for the Stour data there is no evidence for
reversibility. In particular, the plot in the upper right quadrant, even allowing for
sampling variability, is different from that in the lower left quadrant. So too are the
plots in the upper left and lower right quadrants, as predicted if there is nonlinearity
in the data.



Figure 4: Cross correlation functions for the third order dependency of ordinary and
reversed residuals for the Stour data (panels: Residuals vs. Residuals Squared; Residuals
Squared vs. Reversed Residuals; Residuals vs. Reversed Residuals Squared; Reversed
Residuals vs. Reversed Residuals Squared). Bands are individual 95% confidence intervals
for the cross correlation estimates under the hypothesis of independence between the two
series.

4 QUADRATIC TYPES OF PARTIAL CORRELATION COEFFICIENTS

The introduction of reversed residuals allows the familiar partial autocorrelation
function of time series analysis to be seen to some advantage. A discussion of the
usual indirect definition of partial autocorrelations, involving autoregressive models,
and advocacy of the direct definition, is given in Lawrance (1979); it is pointed out
there that a partial autocorrelation is the ordinary correlation between two random
variables after their linear dependence on the partialed-out random variables has
been subtracted. Thus, taking the direct approach here, consider $X_t$ and $X_{t+p}$;
the $p$th partial autocorrelation can be defined as the correlation between $X_t$ and $X_{t+p}$
after each has been adjusted, in the least squares sense, for its linear dependence
on the intervening $X_{t+1}, X_{t+2}, \ldots, X_{t+p-1}$. The adjusted $X_t$ and $X_{t+p}$ are then
simply seen to be the reversed residual $RR_t^{(p-1)}$ and the ordinary residual $R_{t+p}^{(p-1)}$,
respectively. In the partial correlation of $X_t$ and $X_{t+p}$, notice that $p$ ($p \ge 2$) is the
lag and $p - 1$ is the order of the residuals used; the lag 1 partial autocorrelation is,
by convention, the ordinary lag 1 autocorrelation. Partial autocorrelations will often
be calculated or estimated for $p = 2, 3, \ldots$ and used in the determination of $p$, rather
than with $p$ held constant as we have previously suggested for higher order model
validation.

The earlier introduction of quadratic cross correlations suggests the use of two
quadratic partial autocorrelations, of the form

$$\mathrm{Corr}\big\{RR_t^{(p-1)}, (R_{t+p}^{(p-1)})^2\big\}, \qquad \mathrm{Corr}\big\{(RR_t^{(p-1)})^2, R_{t+p}^{(p-1)}\big\}, \qquad p = 2, 3, \ldots \tag{8}$$

As with partial autocorrelations for standard linear autoregressive models, both these
quadratic partial autocorrelations will have cut-offs at the true autoregressive order;
they thus assess the linearity of the autoregressive structure, possibly indicating
that a nonlinear autoregressive model is required. For the Stour data, calculation
of the quadratic partial correlations indicates no autoregressive dependency beyond
lag one. At lag one ($p = 2$ in (8)), the first cross correlation has a value of $-0.121$.
For $p > 2$, the cross correlations of (8) are effectively zero. The cut-off property
does not depend on the direction of autoregression.
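The direct definition of the partial autocorrelation, together with two quadratic versions of the form (8), can be sketched as follows. The function name `partial_corrs` is hypothetical; for each lag $p \ge 2$ it adjusts $X_t$ and $X_{t+p}$ by least squares on the intervening values (using the same rows for both regressions), then correlates the adjusted series, taking the quadratic pairs as $\{RR_t^{(p-1)}, (R_{t+p}^{(p-1)})^2\}$ and $\{(RR_t^{(p-1)})^2, R_{t+p}^{(p-1)}\}$. The example uses a simulated Gaussian AR(2), an assumption for illustration, for which the lag 2 value should be close to the second coefficient and the lag 3 values close to zero.

```python
import numpy as np


def partial_corrs(x, p):
    """Lag-p partial autocorrelation (p >= 2) and two quadratic versions.

    X_t and X_{t+p} are adjusted for X_{t+1}, ..., X_{t+p-1} by least squares, giving
    the reversed residual RR_t^(p-1) and the ordinary residual R_{t+p}^(p-1).
    """
    if p < 2:
        raise ValueError("p must be at least 2; lag 1 is the ordinary autocorrelation")
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    D = np.column_stack([x[i:n - p + i] for i in range(p + 1)])  # X_t, ..., X_{t+p}
    mid = D[:, 1:p]                                     # intervening values
    b, *_ = np.linalg.lstsq(mid, D[:, 0], rcond=None)
    rr = D[:, 0] - mid @ b                              # RR_t^(p-1)
    c, *_ = np.linalg.lstsq(mid, D[:, p], rcond=None)
    r = D[:, p] - mid @ c                               # R_{t+p}^(p-1)

    def corr(u, v):
        return np.corrcoef(u, v)[0, 1]

    return corr(rr, r), corr(rr, r**2), corr(rr**2, r)


rng = np.random.default_rng(4)
n = 20_000
e = rng.normal(size=n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + e[t]   # Gaussian AR(2), illustrative
pac2 = partial_corrs(x, 2)
pac3 = partial_corrs(x, 3)
print([round(v, 3) for v in pac2], [round(v, 3) for v in pac3])
```

Estimating the two adjustments separately follows the spirit of the text; using one coefficient set for both would reproduce the more traditional calculation more closely.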

The computation of all the partial autocorrelation functions follows directly from
the least squares calculations described in Section 2 to obtain the ordinary and
reversed residuals. This method of calculating the standard partial autocorrelations
will usually produce slightly different results from the traditional method of taking
the last estimated coefficient in a linear Gaussian autoregressive model; the traditional
method is equivalent to estimating only the $\alpha$-set of coefficients of the ordinary
residuals and using these also to construct the reversed residuals.

5 REVERSED RESIDUALS FOR RANDOM COEFFICIENT AUTOREGRESSIVE MODELS

A simple nonlinear generalization of standard linear autoregressive models is to allow
the coefficients to be random variables; the general first order form of these models
is then

$$X_t = A_t X_{t-1} + B_t, \qquad t = 0, \pm 1, \pm 2, \ldots, \tag{9}$$

where $\{X_t\}$ has mean $\mu$ and $\{(A_t, B_t)\}$ are independent pairs of possibly dependent
random variables. A number of particular models fit into this class, for instance
the exponential models of Lawrance and Lewis (1981), the gamma models of Lewis
et al. (1989) and the beta models of McKenzie (1985). Currently these models are
well developed for simulation use but less developed in their statistical aspects.
Lawrance and Lewis (1985, 1986) considered residual analysis based on ordinary
residuals for this type of model, and here attention is directed at aspects of the
reversed residuals.

Denoting the first two moments of $\{A_t\}$ by $a = E(A_t)$ and $a_2 = E(A_t^2)$, the first order ($p = 1$)
residuals are given by

$$R_t = (X_t - \mu) - a(X_{t-1} - \mu), \qquad RR_t = (X_t - \mu) - a(X_{t+1} - \mu). \tag{10}$$

Some simple calculations show that the cross correlation function of these residuals
is given, for the first order random coefficient autoregressive model, by

$$\mathrm{Corr}(R_t, RR_{t-r}) = \begin{cases} (1 - a^2)\, a^{|r|}, & r \le 0, \\ -a, & r = 1, \\ 0, & r \ge 2. \end{cases} \tag{11}$$
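Because this cross correlation function, $(1 - a^2)a^{|r|}$ for $r \le 0$, $-a$ for $r = 1$ and $0$ for $r \ge 2$, depends only on second order properties, it can be checked by simulation. The sketch below assumes, purely for illustration, a random coefficient model with $A_t$ equal to 0.9 or 0 with equal probability and $B_t$ unit exponential, independent of $A_t$ (so $a = 0.45$ and $\mu = 1/(1-a)$); the residuals (10) are formed with the known $a$ and $\mu$, and the helper names are hypothetical.

```python
import numpy as np


def simulate_rc_ar1(n, rng, a_high=0.9, prob=0.5):
    """X_t = A_t X_{t-1} + B_t, with A_t = a_high w.p. prob (else 0) and B_t ~ Exp(1).

    An illustrative member of the random coefficient class; A_t and B_t are independent.
    """
    A = np.where(rng.random(n) < prob, a_high, 0.0)
    B = rng.exponential(1.0, n)
    x = np.empty(n)
    x[0] = B[0]
    for t in range(1, n):
        x[t] = A[t] * x[t - 1] + B[t]
    return x


def lagged_corr(u, v, r):
    """Sample Corr(u_t, v_{t-r}) for series observed at the same times."""
    n = u.size
    if r >= 0:
        return np.corrcoef(u[r:], v[:n - r])[0, 1]
    return np.corrcoef(u[:n + r], v[-r:])[0, 1]


def theory_11(a, r):
    """Theoretical Corr(R_t, RR_{t-r}) for the first order model."""
    if r <= 0:
        return (1 - a**2) * a ** abs(r)
    return -a if r == 1 else 0.0


rng = np.random.default_rng(5)
a = 0.5 * 0.9                 # a = E(A_t)
mu = 1.0 / (1.0 - a)          # E(X_t) = E(B_t) / (1 - a)
y = simulate_rc_ar1(200_000, rng) - mu
R = y[1:] - a * y[:-1]        # R[j] is R_{j+1}
RR = y[:-1] - a * y[1:]       # RR[j] is RR_j
R_al, RR_al = R[:-1], RR[1:]  # both now indexed by t = 1, ..., n-2
for r in range(-2, 4):
    print(r, round(lagged_corr(R_al, RR_al, r), 3), round(theory_11(a, r), 3))
```

The sample values should agree with the theoretical ones up to sampling error of a few thousandths at this length.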



The form of this result has been anticipated in Section 2, but of more interest are the
quadratic cross correlation functions (7). For $\mathrm{Cov}(R_t^2, RR_{t-r})$ the following results can
be obtained; the computations are straightforward but lengthy and tedious:

$$\mathrm{Cov}(R_t^2, RR_{t-r}) = (a_2 - a^2)(a_2 - a)\Big\{\mu_3 + \frac{k}{a_2 - a}\Big\} a_2^{\,r-2}, \qquad r \ge 2, \tag{12}$$

$$\mathrm{Cov}(R_t^2, RR_{t-r}) = a^{-r}\big\{(1 - a^2 + a^3 - a^5)\mu_3 - 2a(1 - a^2)(a_2\mu_3 + k)\big\}, \qquad r \le -1, \tag{13}$$

$$\mathrm{Cov}(R_t^2, RR_t) = (1 - a^2 + a^3 - a^5)\mu_3 - 2a(1 - a^2)(a_2\mu_3 + k), \tag{14}$$

$$\mathrm{Cov}(R_t^2, RR_{t-1}) = (-a - a^2 - a^4)\mu_3 + (1 + 2a^2)(a_2\mu_3 + k). \tag{15}$$

The further notation here is $\mu_3 = E(X_t - \mu)^3$, $k = 2\sigma^2\{\mu\,\mathrm{var}(A_t) + \mathrm{Cov}(A_t, B_t)\}$
and $\sigma^2 = \mathrm{var}(X_t)$. This result is of interest because its geometric parameter for $r \ge 2$
is $a_2$, and so the corresponding estimated correlation function would give a possible
way to estimate $a_2$, and hence the variance $a_2 - a^2$ of the random coefficients $A_t$; further,
as will be seen in Section 6, $\mu_3 + k/(a_2 - a)$ is a measure of the directionality of the model.
Another implication of (12) is that the cross correlations for $r \ge 2$ are zero for linear
processes, because then $\mathrm{var}(A_t) = a_2 - a^2 = 0$; this is not so for $r \le 1$, as shown by
(13) to (15). For the Stour data, Figure 4 (lower left quadrant) shows that the cross
correlations corresponding to (12) are all positive and, although small, provide further
evidence against linearity of these data.

For the $p$th order random coefficient autoregressive model

$$X_t = A_t^{(1)} X_{t-1} + A_t^{(2)} X_{t-2} + \cdots + A_t^{(p)} X_{t-p} + B_t, \tag{16}$$

generalizing (9), there is a parallel to the residuals theorem in Lawrance and Lewis
(1986), as follows.

Theorem: For the random coefficient model (16),

$$\mathrm{Corr}\big\{R_t^{(p)}, (RR_{t-r}^{(p)})^2\big\} = 0 \qquad \text{for } r \ge p + 1. \tag{17}$$

Proof: This depends on the independence of the vector of coefficients
$(A_t^{(1)}, A_t^{(2)}, \ldots, A_t^{(p)}, B_t)$ from previous $X_t$'s. We have

$$R_t^{(p)} = X_t - \mu - a_1(X_{t-1} - \mu) - \cdots - a_p(X_{t-p} - \mu) \tag{18}$$

$$= (A_t^{(1)} - a_1)X_{t-1} + (A_t^{(2)} - a_2)X_{t-2} + \cdots + (A_t^{(p)} - a_p)X_{t-p} + B_t - (1 - a_1 - a_2 - \cdots - a_p)\mu, \tag{19}$$

where $E(A_t^{(i)}) = a_i$. Multiplying (19) by $(RR_{t-r}^{(p)})^2$ and taking expectations gives

$$E\big\{R_t^{(p)} (RR_{t-r}^{(p)})^2\big\} = \sum_{i=1}^{p} E\big\{(A_t^{(i)} - a_i)\, X_{t-i}\, (RR_{t-r}^{(p)})^2\big\} + E\big[\{B_t - (1 - a_1 - \cdots - a_p)\mu\}(RR_{t-r}^{(p)})^2\big]. \tag{20}$$

Now $RR_{t-r}^{(p)}$ involves $X_{t-r}, X_{t-r+1}, \ldots, X_{t-r+p}$; when $r \ge p + 1$ this sequence is
independent of $A_t^{(i)}$, as is $X_{t-i}$. Since $A_t^{(i)} - a_i$ has zero expectation, so does the
first summation in (20), and the second term is also zero by very similar arguments,
since $E(B_t) = (1 - a_1 - \cdots - a_p)\mu$. The random coefficients can still be dependent
within each $t$ for the argument to remain valid. The theorem is proved since, $R_t^{(p)}$
having zero mean, (20) is the covariance corresponding to (17). □



6 REVERSIBILITY FOR FIRST ORDER RANDOM COEFFICIENT AUTOREGRESSIVE MODELS

The first order random coefficient autoregressive model (9) will be fully reversible
when the joint distribution of $(X_t, X_{t-1})$ is symmetric; this follows from results of
McKenzie (1985) for first order Markov models. However, although this joint
distribution was obtained explicitly for the gamma model of Lewis et al. (1989), thus
establishing the reversibility of that process, the joint distribution cannot in general be
obtained in a form explicit enough to yield a tractable condition. We thus revert
to the simplest moment form of partial reversibility, defined by the requirement that

$$E\{(X_t - \mu)(X_{t-r} - \mu)^2\} = E\{(X_t - \mu)^2 (X_{t-r} - \mu)\}, \qquad r = 0, \pm 1, \pm 2, \ldots \tag{21}$$
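Sample analogues of the two sides of (21) give a simple empirical check of this moment form of reversibility. In the sketch below (the function name `directional_moments` is hypothetical), the sample mean replaces $\mu$; reversing the time order of a series exactly interchanges the two estimates, which the example uses as a check. The i.i.d. gamma data are an arbitrary illustration.

```python
import numpy as np


def directional_moments(x, r):
    """Sample versions of both sides of (21) at lag r.

    Returns (m1, m2) with m1 estimating E{(X_t - mu)(X_{t-r} - mu)^2} and
    m2 estimating E{(X_t - mu)^2 (X_{t-r} - mu)}, using the sample mean for mu.
    """
    y = np.asarray(x, dtype=float)
    y = y - y.mean()
    n = y.size
    if r >= 0:
        cur, lag = y[r:], y[:n - r]
    else:
        cur, lag = y[:n + r], y[-r:]
    return np.mean(cur * lag**2), np.mean(cur**2 * lag)


rng = np.random.default_rng(6)
x = rng.gamma(2.0, 1.0, size=1000)       # any series; here i.i.d. skewed values
m1, m2 = directional_moments(x, 1)
m1_rev, m2_rev = directional_moments(x[::-1], 1)
print(m1, m2, m1_rev, m2_rev)
```

For an i.i.d. series both estimates are near zero at non-zero lags; a systematic difference between them across lags is what signals directionality.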

For the first order random coefficient model (9), expressions for these joint
moments are given in Lawrance and Lewis (1986, equations (4.5) for $r \ge 1$ and (4.13)
for $r \le -1$) as $a^{|r|}\mu_3$ and $a_2^{|r|}\mu_3 + k(a_2^{|r|} - a^{|r|})/(a_2 - a)$, respectively. The equality of
these directional moments for all $r$ yields the condition

$$\mu_3 + k/(a_2 - a) = 0. \tag{22}$$



This condition simplifies a little when $A_t$ and $B_t$ are independent, and can then
be written as

$$\mu_3 = 2\mu\sigma^2\,\mathrm{var}(A_t)\,/\,\{a(1 - a) - \mathrm{var}(A_t)\}. \tag{23}$$

Note in particular that, under (23), the skewness of $X_t$ is non-zero unless $\mathrm{var}(A_t) = 0$,
as it is in the case of the linear model; the skewness can, however, be zero when $A_t$
and $B_t$ are correlated.
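The algebra linking (22) and (23) can be checked numerically. The sketch below evaluates the directionality measure $\mu_3 + k/(a_2 - a)$ from assumed parameter values, with $a_2 = \mathrm{var}(A_t) + a^2$ and $k = 2\sigma^2\{\mu\,\mathrm{var}(A_t) + \mathrm{Cov}(A_t, B_t)\}$, and confirms that the value of $\mu_3$ given by (23) makes it vanish when $A_t$ and $B_t$ are independent. The function names are hypothetical, and the numbers are arbitrary; they are not claimed to be mutually consistent for any particular model.

```python
def directionality(mu3, mu, sigma2, a, var_A, cov_AB=0.0):
    """mu3 + k / (a2 - a); zero exactly when the moment condition (22) holds."""
    a2 = var_A + a**2                          # second moment of A_t
    k = 2.0 * sigma2 * (mu * var_A + cov_AB)
    return mu3 + k / (a2 - a)


def mu3_reversible(mu, sigma2, a, var_A):
    """Third central moment required by (23) when A_t and B_t are independent."""
    return 2.0 * mu * sigma2 * var_A / (a * (1.0 - a) - var_A)


mu, sigma2, a, var_A = 2.0, 1.5, 0.45, 0.1    # arbitrary illustrative values
mu3 = mu3_reversible(mu, sigma2, a, var_A)
print(mu3, directionality(mu3, mu, sigma2, a, var_A))
print(directionality(0.7, mu, sigma2, a, 0.0))  # linear case: the measure equals mu3
```

In the linear case $k = 0$, so the measure reduces to $\mu_3$ and vanishes only for zero skewness.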

It can be seen from (12) that $\mathrm{Corr}(R_t^2, RR_{t-r})$ for $r \ge 2$ is zero when the
process satisfies the reversibility condition (22); this could be a useful result in
model validation. In this respect, consideration of $\mathrm{Corr}(R_t^2, R_{t-r})$, as derived
in Lawrance and Lewis (1986, equation (4.14)), is also relevant. This can be cast in the
new form, for $r \ge 1$, of
$$\mathrm{Cov}(R_t^2, R_{t-r}) = (a_2 - a^2)(1 - a a_2)\Big\{\mu_3 + \frac{k}{a_2 - a}\Big\} a_2^{\,r-1} - \frac{a(1 - a)(1 - a^2)}{a_2 - a}\, k\, a^{r-1}. \tag{24}$$



Thus, under the reversibility condition (22), the first term is zero and $\mathrm{Corr}(R_t^2, R_{t-r})$
decreases geometrically in the parameter $a$. It is such special simplifications
which justify the consideration of more than one of the mathematically equivalent
cross correlation functions described in Section 3.